Skip to content

fix(cli): surface deploy remote output, write the audit log, seed flat-alias ssh config (#2957)#3166

Merged
bpamiri merged 1 commit into
developfrom
peter/deploy-w3-observability
Jun 12, 2026
Merged

fix(cli): surface deploy remote output, write the audit log, seed flat-alias ssh config (#2957)#3166
bpamiri merged 1 commit into
developfrom
peter/deploy-w3-observability

Conversation

@bpamiri

@bpamiri bpamiri commented Jun 12, 2026

Copy link
Copy Markdown
Collaborator

Wave 3 of the #2957 deploy-orchestration campaign: observability (DEP-6a/6b) + flat-alias ssh config (DEP-7). Refs #2957.

Closes these items from the 2026-06-12 re-scope checklist on #2957:

  • DEP-6a Surface remote stdout from read verbs (audit, details, app logs/details, lock status) — dispatch closures currently drop ssh.run() results (DeployMainCli.cfc:447-449); live mode returns only host-count summaries.
  • DEP-6b Wire AuditorCommands.record() (zero call sites) into deploy/rollback/setup flows so /tmp/kamal-audit.log is actually written before audit tails it.
  • DEP-7 Flat bootstrap/exec aliases must build the pool via $deployBuildSshPool(opts.configPath) (Module.cfc:2419-2421, :2433-2435 use bare new SshPool()), matching the nested server branch — these aliases are the only CLI-reachable form (wheels deploy server bootstrap is interpreted as LuCLI server command instead of deploy subcommand #2677).

What changed

DEP-6a — read verbs surface remote output. New $dispatchCollect (DeployMainCli + DeployAppCli, mirrored; $dispatchAnyCollect on DeployLockCli) runs read commands via SshPool.sequential/onAny, collects result.stdout host-prefixed ([host] line), and appends it to the live-mode summary. Stderr is the fallback when a tolerated (allowFail) command produced no stdout — so lock status on an un-held lock shows readlink's "No such file" diagnostic instead of nothing. Covered verbs: audit, details (app + proxy + accessories), app logs/details/containers/images, lock status. Sequential execution keeps output order stable and the collection single-threaded (no shared-array races on the real pool's executor). Dry-run behavior unchanged.

DEP-6b — the audit log is actually written. $deploy() now brackets the work with auditor.record("started deploy of version X") (after lock acquire, before pull) and record("completed deploy of version X") (after the role loop, before lock release); rollback() records rolled back to version X; setup() records booted accessory <name> per accessory on its hosts. All audit dispatches pass allowFail=true — observability can never fail a deploy (spec pins a read-only-fs record failure not aborting the deploy).

DEP-7 — flat aliases honor ssh: config. case "bootstrap" / case "exec" in Module.cfc now construct DeployServerCli with $deployBuildSshPool(opts.configPath) (SshPoolFactory seeds user/port/first key from deploy.yml) instead of a bare new SshPool() (root@22 + ssh-agent). New DeployAliasSshPoolSpec is a source-level guard, following the established Module.cfc spec pattern (Module extends modules.BaseModule, only resolvable at LuCLI runtime; the pool has no accessor through DeployServerCli): it bans any bare new modules.wheels.services.deploy.lib.SshPool( in Module.cfc and asserts both alias case-bodies call the factory. The factory's config-seeding behavior itself is pinned by the existing SshPoolFactorySpec.

Evidence

TDD red-first: 12 new specs, all failing before the fix, all green after:

  • DeployMainCliSpec: audit/details surface host-prefixed stdout; deploy brackets work with started/completed records (ordering pinned: acquire < started < pull, run < completed < release); rollback + setup-accessory records; failing audit record never fails the deploy.
  • DeployAppCliSpec: logs/containers surface host-prefixed stdout.
  • DeployLockCliSpec: status surfaces readlink stdout; stderr fallback when no lock is held.
  • DeployAliasSshPoolSpec (new): no bare SshPool in Module.cfc; both flat alias bodies call $deployBuildSshPool(opts.configPath).

CLI suite in the lucee7 docker harness (/wheels/cli/tests?format=json):

run pass fail error
baseline (red run, develop + new specs) 1006 12 (the 12 new specs) 2
after fix 1018 0 2

The 2 errors are the known docker-env artifacts (SshClientSpec/SshPoolSpec docker-not-found inside the harness container) — present on the baseline too.

Real SSH / docker-remote behavior is unverifiable in this harness — FakeSshPool specs + the dry-run flows are the verification bar here, per the campaign convention. The env-gated E2EDeploySpec (DEPLOY_E2E=1) exercises the same $deploy() path over real sshd and is unaffected (audit records are plain echo >> /tmp/kamal-audit.log appends through the shim-logged shell).

🤖 Generated with Claude Code

…t-alias ssh config (#2957)

Wave 3 of the #2957 deploy-orchestration campaign (observability + ssh aliases).

DEP-6a: the dispatch closures dropped every ssh.run() result, so read verbs
(audit, details, app logs/details/containers/images, lock status) returned
only host-count summaries in live mode. Add a sequential collecting
dispatcher that surfaces remote stdout host-prefixed ([host] line), with
stderr fallback for tolerated failures (lock status on a missing lock).

DEP-6b: AuditorCommands.record() had zero call sites, so 'wheels deploy
audit' tailed /tmp/kamal-audit.log — a file this tool never wrote. Deploy
now brackets the work with started/completed records (after lock acquire,
before lock release), rollback records the target version, and setup
records each accessory boot. All records dispatch allowFail so a
read-only-fs or similar can never fail the deploy.

DEP-7: the flat 'bootstrap'/'exec' aliases — the only CLI-reachable form
(#2677 picocli shadowing) — constructed a bare 'new SshPool()' that ignored
deploy.yml's ssh: block (user/port/keys -> root@host:22 + ssh-agent). They
now route through $deployBuildSshPool(opts.configPath) like every other
verb; a source-level guard spec bans bare SshPool construction in
Module.cfc.

TDD red-first: 12 new specs (FakeSshPool + source inspection), all red
before the fix, green after. CLI suite in the lucee7 docker harness:
1018 pass / 0 fail / 2 known docker-env errors (SshClientSpec/SshPoolSpec
docker-not-found), up from the 1006-pass baseline.

Signed-off-by: Peter Amiri <peter@alurium.com>
@bpamiri bpamiri enabled auto-merge (squash) June 12, 2026 20:08
@github-actions github-actions Bot added the docs label Jun 12, 2026
@bpamiri bpamiri merged commit c363194 into develop Jun 12, 2026
7 checks passed
@bpamiri bpamiri deleted the peter/deploy-w3-observability branch June 12, 2026 20:11

@wheels-bot wheels-bot Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wheels Bot — Reviewer

TL;DR: Wave 3 of the #2957 deploy campaign — read verbs now surface host-prefixed remote output (DEP-6a), the audit log is actually written (DEP-6b), and the flat bootstrap/exec aliases seed their SSH pool from deploy.yml (DEP-7). I verified every claim against the source and found no blocking issues: the closure-collection pattern is correct, the audit bracketing matches the stated ordering, Module.cfc is clean of bare pools, and the 12 new specs exercise both happy and failure paths. Verdict: comment — two non-blocking suggestions below.

Verified (evidence)

  • Closure safety. All three collect helpers (DeployMainCli.cfc:339, DeployAppCli.cfc:139, DeployLockCli.cfc:99) capture cmd/doRaise into locals and accumulate via a shared struct (var ctx = {lines: []}) — the exact pattern CLAUDE.md anti-pattern #10 prescribes, mirroring the existing $dispatch helpers.
  • sequential is genuinely required, not just stylistic. SshPool.onEach (SshPool.cfc:57) submits to a thread-pool executor, so appending to a shared CFML array from its callbacks would race; SshPool.sequential (SshPool.cfc:119) is a plain serial loop. The choice is sound and the doc comments say why.
  • Audit bracketing matches the claim. started record dispatched after lock acquire / before pull (DeployMainCli.cfc:110), completed after the role loop / before the finally lock release (DeployMainCli.cfc:139), both allowFail=true. Rollback (DeployMainCli.cfc:209) and setup accessory boots likewise. The ordering spec in DeployMainCliSpec.cfc pins acquire < started < pull and run < completed < release.
  • The "failing audit never fails deploy" spec is meaningful. FakeSshPool.run honors the opts.raise contract (FakeSshPool.cfc:99-101), so the exitCode: 1 expectation would throw if the call site weren't allowFail.
  • DEP-7 ban check holds. Post-fix, Module.cfc contains zero new modules.wheels.services.deploy.lib.SshPool( occurrences and 11 $deployBuildSshPool(opts.configPath) call sites; the flat case "bootstrap":/case "exec": labels precede the nested server branch, so DeployAliasSshPoolSpec.$caseBody's first-occurrence slicing targets the right bodies. Source-level Module.cfc inspection has 7 prior-art specs (e.g. McpHiddenToolsSpec, ConsoleCommandSpec).
  • Dry-run behavior preserved. Each $dispatchCollect dry-run branch appends the same [host] cmd lines to dryRunBuffer as the $dispatch it replaces (including $dispatchAnyCollect writing only hosts[1], matching $dispatchAny's any-one-host semantics at DeployLockCli.cfc:80-85).

Suggestions (non-blocking)

  1. audit verb still hard-fails on a host that has never deployed. DeployMainCli.cfc:266$dispatchCollect(hosts, cmd, dryRun) leaves allowFail at its false default, so tail on a missing /tmp/kamal-audit.log raises Wheels.Deploy.RemoteExecutionFailed and aborts mid-fleet. This is pre-existing behavior (the old $dispatch raised too), but this PR gave lock status the nicer treatment — allowFail=true + stderr fallback (DeployLockCli.cfc:54) — and the same would suit audit: a fresh host would show tail: cannot open '/tmp/kamal-audit.log' as advisory output instead of an exception. Fine as a follow-up.
  2. Blank lines in remote output are dropped. listToArray(text, chr(10)) skips empty list elements, so blank lines in e.g. app logs output silently disappear (DeployMainCli.cfc:361, mirrored in the other two helpers). Cosmetic for docker ps-style output; if log fidelity ever matters, listToArray(text, chr(10), true) preserves them.

Tests

12 new specs, red-first per the PR evidence table (1006 → 1018 pass, 0 fail). Coverage spans happy path (stdout surfaced host-prefixed), failure path (stderr fallback on nonzero exit, failing audit record tolerated), ordering (bracket records vs. lock/pull/run), and a structural regression guard for DEP-7. All extend the established wheels.wheelstest.system.BaseSpec deploy-spec pattern with FakeSshPool.

Docs & Commits

  • Changelog fragment changelog.d/deploy-w3-observability.fixed.md present with valid fixed type — no direct CHANGELOG.md edit. ✓
  • .ai/wheels/deploy.md stays accurate (its wheels deploy audit line — "tail /tmp/kamal-audit.log on each server" — is now more true, since the file finally gets written).
  • Single commit, header fix(cli): … at 95 chars, type/scope valid per commitlint.config.js, DCO Signed-off-by present, body explains the why. ✓

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant